Goto

Collaborating Authors

 small weight



Pay Attention to Small Weights

Zhou, Chao, Jacobs, Tom, Gadhikar, Advait, Burkholz, Rebekka

arXiv.org Artificial Intelligence

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.



Reviews: Extracting Relationships by Multi-Domain Matching

Neural Information Processing Systems

Title: Extracting Relationships by Multi-Domain Matching Summary Assuming that a corpus is compiled from many sources belonging to different to domains, of which only a strict subset of domains is suitable to learn how to do prediction in a target domain, this paper proposes a novel approach (called Multiple Domain Matching Network (MDMN)) that aims at learning which domains share strong statistical relationships, and which source domains are best at supporting to learn the target domain prediction tasks. While many approaches to multiple-domain adaptation aim to match the feature-space distribution of *every* source domain to that of the target space, this paper suggests to not only map the distribution between sources and target, but also *within* source domains. The latter allows for identifying subsets of source domains that share a strong statistical relationship. Strengths Paper provides a theoretical analysis that yields a tighter bound on the weighted multi-source discrepancy. Weaknesses Tighter bound on multi-source discrepancy depends on the assumption that source domains that are less relevant for the target domain have lower weights.


Neural Computing with Small Weights

Neural Information Processing Systems

An important issue in neural computation is the dynamic range of weights in the neural networks. Many experimental results on learning indicate that the weights in the networks can grow prohibitively large with the size of the inputs. Here we address this issue by studying the tradeoffs between the depth and the size of weights in polynomial-size networks of linear threshold elements (LTEs). We show that there is an efficient way of simulating a network of LTEs with large weights by a network of LTEs with small weights. In particular, we prove that every depth-d, polynomial-size network of LTEs with exponentially large integer weights can be simulated by a depth-(2d 1), polynomial-size network of LTEs with polynomially bounded integer weights.


A Tighter Bound for Graphical Models

Leisink, Martijn A. R., Kappen, Hilbert J.

Neural Information Processing Systems

The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called'visibles'. The other nodes, known as'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set, can be computed. In general the time it takes, scales exponentially with the number of hidden neurons.


A Tighter Bound for Graphical Models

Leisink, Martijn A. R., Kappen, Hilbert J.

Neural Information Processing Systems

The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called'visibles'. The other nodes, known as'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set, can be computed. In general the time it takes, scales exponentially with the number of hidden neurons.


A Tighter Bound for Graphical Models

Leisink, Martijn A. R., Kappen, Hilbert J.

Neural Information Processing Systems

Theneurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called'visibles'. The other nodes, known as'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set, can be computed. In general the time it takes, scales exponentially with the number of hidden neurons.


On Neural Networks with Minimal Weights

Bohossian, Vasken, Bruck, Jehoshua

Neural Information Processing Systems

Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; actually, they can be very big integers-exponential in the number of the input variables. However, in practice, it is difficult to implement big weights. In the present literature a distinction is made between the two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights.


On Neural Networks with Minimal Weights

Bohossian, Vasken, Bruck, Jehoshua

Neural Information Processing Systems

Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; actually, they can be very big integers-exponential in the number of the input variables. However, in practice, it is difficult to implement big weights. In the present literature a distinction is made between the two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights.